Transition Entropy in Partially Observable Markov Decision Processes
Abstract
This paper proposes a new heuristic algorithm suitable for real-time applications that use partially observable Markov decision processes (POMDPs). The algorithm is based on a reward shaping strategy that includes entropy information in the reward structure of a fully observable Markov decision process (MDP). As the presented results illustrate, this strategy exhibits near-optimal performance in all examples tested.
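The abstract does not give the exact form of the reward shaping, so the following is only a minimal sketch of one way entropy information could enter the reward structure of a fully observable MDP. It rests on several assumptions that are not taken from the paper: the shaped reward subtracts a weighted Shannon entropy of the next-state distribution, the shaped MDP is solved by plain value iteration, and the POMDP action is chosen by weighting the resulting Q-values with the belief state. The names `transition_entropy`, `shaped_q_values`, `select_action`, and the parameter `weight` are hypothetical.

```python
import math


def transition_entropy(probs):
    """Shannon entropy (in nats) of one next-state distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shaped_q_values(T, R, weight=1.0, gamma=0.95, iters=500):
    """Run value iteration on an MDP whose rewards are entropy-shaped.

    T[a][s] is the next-state distribution for action a in state s,
    and R[a][s] is the immediate reward. Returns Q as Q[a][s].
    """
    n_actions, n_states = len(T), len(T[0])
    # ASSUMPTION (not from the paper): penalize uncertain transitions by
    # subtracting the weighted transition entropy from the reward.
    shaped = [[R[a][s] - weight * transition_entropy(T[a][s])
               for s in range(n_states)]
              for a in range(n_actions)]
    V = [0.0] * n_states
    Q = shaped
    for _ in range(iters):
        # Bellman backup using the shaped rewards.
        Q = [[shaped[a][s] + gamma * sum(p * v for p, v in zip(T[a][s], V))
              for s in range(n_states)]
             for a in range(n_actions)]
        V = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
    return Q


def select_action(Q, belief):
    """Pick the action with the highest belief-weighted Q-value.

    ASSUMPTION: the belief is a probability vector over states and the
    MDP solution is averaged over it to act in the POMDP.
    """
    scores = [sum(b * q for b, q in zip(belief, Q[a])) for a in range(len(Q))]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch only the offline value iteration is expensive. The online step is a single weighted sum per action, which is the kind of cost profile that makes such a heuristic plausible for real-time use. Setting `weight=0.0` recovers the unshaped MDP, which is useful for comparing the two behaviors on a toy problem.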
Similar resources
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is worthwhile to evaluate a maintenance policy in terms of cost and availability simultaneously, according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
The Use of Transition Entropy in Partially Observable Markov Decision Processes
In this report we describe a new POMDP algorithm, denoted TEQ-MDP, which computes the optimal policy of a modified MDP and uses the obtained optimal solution to compute the action for the POMDP, as a function of the belief-state. The modified MDP includes state entropy information (transition entropy) in its reward structure so as to value actions that gather information. This algorithm is suit...
Multiple-Environment Markov Decision Processes
We introduce Multi-Environment Markov Decision Processes (MEMDPs) which are MDPs with a set of probabilistic transition functions. The goal in a MEMDP is to synthesize a single controller with guaranteed performances against all environments even though the environment is unknown a priori. While MEMDPs can be seen as a special class of partially observable MDPs, we show that several verificatio...
The Duality of State and Observation in Probabilistic Transition Systems
In this paper we consider the problem of representing and reasoning about systems, especially probabilistic systems, with hidden state. We consider transition systems where the state is not completely visible to an outside observer. Instead, there are observables that partly identify the state. We show that one can interchange the notions of state and observation and obtain what we call a dual ...
Producing Efficient Error-Bounded Solutions for Transition Independent Decentralized MDPs
There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading ...